A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing
Authors
Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM
Abstract
While it is widely held that an organism’s genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error (G-to-A) dominates the results. It is still present in mismatches with 99% accuracy and vanishes only in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidate RNA editing in human and mouse, and reveal, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive is a valuable resource for investigating DNA and RNA editing, and that it sets the stage for a comprehensive mapping of editing events in large-scale genomic datasets.

Citation: Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM (2010) A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing. PLoS Genet 6(5): e1000954. doi:10.1371/journal.pgen.1000954
Editor: Dirk Schübeler, Friedrich Miescher Institute for Biomedical Research, Switzerland
Received August 20, 2009; Accepted April 15, 2010; Published May 20, 2010
Copyright: 2010 Zaranek et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: EYL was supported by the Machiah foundation. Funding also came from a National Human Genome Research Institute Centers of Excellence in Genomic Science grant to GMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected] (AWZ); [email protected] (EYL)
These authors contributed equally to this work.
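The abstract's core idea, looking for clusters of same-type mismatches and re-checking them at stricter quality thresholds, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes per-base quality is given as Phred scores, where accuracy = 1 - 10^(-Q/10), so the 99% and 99.99% accuracy levels mentioned in the abstract would correspond to Q20 and Q40. The names `Mismatch`, `find_mismatches`, and `dominant_cluster`, the toy sequences, and the per-read definition of a cluster are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Mismatch(NamedTuple):
    position: int   # 0-based offset within the aligned read
    ref_base: str   # base in the reference genome
    read_base: str  # base called in the trace
    phred: int      # Phred quality score of the called base


def phred_to_accuracy(q: int) -> float:
    """Convert a Phred score Q to base-call accuracy, 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def find_mismatches(ref: str, read: str, quals: Sequence[int]) -> List[Mismatch]:
    """List positions where an (already aligned, gap-free) read differs from the reference."""
    return [
        Mismatch(i, r, b, q)
        for i, (r, b, q) in enumerate(zip(ref, read, quals))
        if r != b
    ]


def dominant_cluster(
    mismatches: Sequence[Mismatch], min_phred: int, min_size: int
) -> Optional[Tuple[str, int]]:
    """Return (mismatch type, count) for the most frequent mismatch type among
    mismatches with quality >= min_phred, or None if it has fewer than
    min_size members. Assumption: a 'cluster' is counted per read."""
    kept = [m for m in mismatches if m.phred >= min_phred]
    counts = Counter(f"{m.ref_base}>{m.read_base}" for m in kept)
    if not counts:
        return None
    kind, n = counts.most_common(1)[0]
    return (kind, n) if n >= min_size else None


# Toy data (made up): four G>A mismatches, two of them with quality below Q40.
ref = "GGAGGCGTGG"
read = "AGAAGCATGA"
quals = [45, 30, 30, 22, 30, 30, 41, 30, 30, 25]

mismatches = find_mismatches(ref, read, quals)
print(dominant_cluster(mismatches, min_phred=20, min_size=3))  # ('G>A', 4)
print(dominant_cluster(mismatches, min_phred=40, min_size=3))  # None
```

In this toy read the G>A cluster is reported at Q20 but disappears at Q40, which mirrors the pattern the abstract describes for the systematic sequencing error. Thresholding on the integer Phred score rather than on the floating-point accuracy avoids rounding issues at the cutoff.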